Abstract. This is an article for a general mathematical audience on the author's
work, joint with Terence Tao, establishing that there are arbitrarily long arithmetic 
progressions of primes. 



1. INTRODUCTION AND HISTORY 

This is a description of recent work of the author and Terence Tao [11] on primes
in arithmetic progression. It is based on seminars given for a general mathematical 
audience in a variety of institutions in the UK, France, the Czech Republic, Canada 
and the US. 

Perhaps curiously, the order of presentation is much closer to the order in which we 
discovered the various ingredients of the argument than it is to the layout in [11]. We
hope that both expert and lay readers might benefit from contrasting this account with
[11] as well as the expository accounts by Kra jTH] and Tao (2E1 EH].

As we remarked, this article is based on lectures given to a general audience. It was often 
necessary, when giving these lectures, to say things which were not strictly speaking true 
for the sake of clarity of exposition. We have retained this style here. However, it being 
undesirable to commit false statements to print, we have added numerous footnotes 
alerting readers to points where we have oversimplified, and directing them to places in 
the literature where fully rigorous arguments can be found. 

Our result is: 

Theorem 1.1 (G.-Tao). The primes contain arbitrarily long arithmetic progressions. □

Let us start by explaining that the truth of this statement is not in the least surprising. 
For a start, it is rather easy to write down a progression of five primes (for example 
5, 11, 17, 23, 29), and in 2004 Frind, Jobling and Underwood produced the example 

$$56211383760397 + 44546738095860k, \qquad k = 0, 1, \dots, 22,$$

of 23 primes in arithmetic progression. A very crude heuristic model for the primes may 
be developed based on the prime number theorem, which states that $\pi(N)$, the number
of primes less than or equal to $N$, is asymptotic to $N/\log N$. We may alternatively
express this as

$$\mathbb{P}(x \text{ is prime} \mid 1 \le x \le N) \approx 1/\log N.$$
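The two progressions quoted above can be checked by machine. The following is a minimal sketch, not part of the original argument: it assumes a deterministic Miller-Rabin test with the first twelve primes as bases (known to be exact for integers far beyond the size needed here), and the helper names `is_prime` and `progression` are ours.

```python
def is_prime(n: int) -> bool:
    """Deterministic Miller-Rabin; exact for every n below about 3.3e24."""
    if n < 2:
        return False
    bases = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37)
    for p in bases:
        if n % p == 0:
            return n == p
    # Write n - 1 = d * 2^s with d odd.
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for a in bases:
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = x * x % n
            if x == n - 1:
                break
        else:
            return False  # a is a witness that n is composite
    return True


def progression(start: int, step: int, length: int) -> list:
    """The terms start, start + step, ..., start + (length - 1) * step."""
    return [start + step * k for k in range(length)]


five_term = progression(5, 6, 5)
record_2004 = progression(56211383760397, 44546738095860, 23)

print(five_term, all(is_prime(q) for q in five_term))
print(len(record_2004), all(is_prime(q) for q in record_2004))
```

Note that the common difference of the long progression is a multiple of 2 · 3 · 5 · ... · 23, as it must be for 23 primes all exceeding 23 to be equally spaced.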

Consider now the collection of all arithmetic progressions 

$$x,\ x + d,\ \dots,\ x + (k-1)d$$

with $x, d \in \{1, \dots, N\}$. Select $x$ and $d$ at random from amongst the $N^2$ possible choices,
and write $E_j$ for the event that $x + jd$ is prime, for $j = 0, 1, \dots, k-1$. The prime number
theorem tells us that

$$\mathbb{P}(E_j) \approx 1/\log N.$$

If the events $E_j$ were independent we should therefore have

$$\mathbb{P}(x, x + d, \dots, x + (k-1)d \text{ are all prime}) = \mathbb{P}\Big(\bigwedge_{j=0}^{k-1} E_j\Big) \approx 1/(\log N)^k.$$

We might then conclude that

$$\#\{x, d \in \{1, \dots, N\} : x, x + d, \dots, x + (k-1)d \text{ are all prime}\} \approx \frac{N^2}{(\log N)^k}.$$

For fixed $k$, and in fact for $k$ nearly as large as $2\log N/\log\log N$, this is an increasing
function of $N$. This suggests that there are infinitely many $k$-term arithmetic progressions
of primes for any fixed $k$, and thus arbitrarily long such progressions.
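To get a feeling for how the naive prediction compares with reality, one can simply count. The sketch below is an illustration only: the function names are ours, the values of $N$ are chosen to keep the running time short, and no claim is made about the precise numbers printed. It counts pairs $(x, d)$ with $1 \le x, d \le N$ for which $x, x + d, x + 2d$ are all prime and prints the ratio to $N^2/(\log N)^3$.

```python
import math


def prime_table(limit: int) -> bytearray:
    """table[n] == 1 exactly when n is prime, for 0 <= n <= limit."""
    table = bytearray([1]) * (limit + 1)
    table[0:2] = b"\x00\x00"
    for p in range(2, math.isqrt(limit) + 1):
        if table[p]:
            table[p * p :: p] = bytearray(len(range(p * p, limit + 1, p)))
    return table


def count_prime_aps(N: int, k: int) -> int:
    """Number of (x, d) in {1..N}^2 with x, x+d, ..., x+(k-1)d all prime."""
    is_prime = prime_table(k * N)  # the last term is at most N + (k-1)N
    count = 0
    for x in range(1, N + 1):
        if not is_prime[x]:
            continue
        for d in range(1, N + 1):
            if all(is_prime[x + j * d] for j in range(1, k)):
                count += 1
    return count


def naive_prediction(N: int, k: int) -> float:
    """The independence heuristic N^2 / (log N)^k."""
    return N ** 2 / math.log(N) ** k


for N in (250, 500, 1000):
    actual = count_prime_aps(N, 3)
    print(N, actual, round(actual / naive_prediction(N, 3), 3))
```

The ratio in the last column is what the heuristic gets wrong; the discussion that follows explains why one should expect it to be a constant rather than 1.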

Of course, the assumption that the events Ej are independent was totally unjustified. 
If $E_0$, $E_1$ and $E_2$ all hold then one may infer that $x$ is odd and $d$ is even, which increases
the chance that $E_3$ also holds by a factor of two. There are, however, more sophisticated
heuristic arguments available, which take account of the fact that the primes $> q$ fall
only in those residue classes $a \pmod q$ with $a$ coprime to $q$. There are very general
conjectures of Hardy-Littlewood which derive from such heuristics, and a special case of 
these conjectures applies to our problem. It turns out that the extremely naive heuristic 
we gave above only misses the mark by a constant factor: 



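Conjecture 1.2 (Hardy-Littlewood conjecture on $k$-term APs). For each $k$ we have

$$\#\{x, d \in \{1, \dots, N\} : x, x + d, \dots, x + (k-1)d \text{ are all prime}\} = \frac{\gamma_k N^2}{(\log N)^k}(1 + o(1)),$$

where

$$\gamma_k := \prod_p \alpha_p^{(k)}$$

is a certain product of "local densities" which is rapidly convergent and positive.
We have

$$\alpha_p^{(k)} := \begin{cases} \dfrac{1}{p}\Big(\dfrac{p}{p-1}\Big)^{k-1} & \text{if } p \le k \\[2mm] \Big(1 - \dfrac{k-1}{p}\Big)\Big(\dfrac{p}{p-1}\Big)^{k-1} & \text{if } p \ge k. \end{cases}$$

In particular we compute$^1$

$$\gamma_3 = 2\prod_{p \ge 3}\Big(1 - \frac{1}{(p-1)^2}\Big) \approx 1.32032$$

and

$$\gamma_4 = \frac{9}{2}\prod_{p \ge 5}\Big(1 - \frac{3p-1}{(p-1)^3}\Big) \approx 2.85825.$$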

What we actually prove is a somewhat more precise version of Theorem 1.1 which gives
a lower bound falling short of the Hardy-Littlewood conjecture by just a constant factor. 
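The constants in the Hardy-Littlewood conjecture are products over primes of local densities, equal to $\frac{1}{p}(\frac{p}{p-1})^{k-1}$ for $p \le k$ and $(1 - \frac{k-1}{p})(\frac{p}{p-1})^{k-1}$ for $p \ge k$, so they can be approximated by truncating the product. The sketch below does this; the cutoff of $10^5$ and the function names are our own choices, and a truncated product is of course only an approximation to the infinite one.

```python
import math


def primes_up_to(limit: int) -> list:
    """All primes p <= limit, by the sieve of Eratosthenes."""
    table = bytearray([1]) * (limit + 1)
    table[0:2] = b"\x00\x00"
    for p in range(2, math.isqrt(limit) + 1):
        if table[p]:
            table[p * p :: p] = bytearray(len(range(p * p, limit + 1, p)))
    return [p for p in range(2, limit + 1) if table[p]]


def local_density(p: int, k: int) -> float:
    """The factor contributed by the prime p for k-term progressions."""
    ratio = (p / (p - 1)) ** (k - 1)
    if p <= k:
        return ratio / p
    return (1 - (k - 1) / p) * ratio


def hardy_littlewood_constant(k: int, prime_cutoff: int = 10 ** 5) -> float:
    """Product of the local densities over primes up to prime_cutoff."""
    product = 1.0
    for p in primes_up_to(prime_cutoff):
        product *= local_density(p, k)
    return product


print(hardy_littlewood_constant(3))
print(hardy_littlewood_constant(4))
```

Because the factor at $p$ is $1 + O(1/p^2)$, the neglected tail changes the product only in the sixth decimal place or so at this cutoff.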



1 For a tabulation of values of $\gamma_k$, $3 \le k \le 20$, see [16]. As $k \to \infty$, $\log\gamma_k \sim k\log\log k$.
Theorem 1.3 (G.-Tao). For each $k \ge 3$ there is a constant $\gamma'_k > 0$ such that

$$\#\{x, d \in \{1, \dots, N\} : x, x + d, \dots, x + (k-1)d \text{ are all prime}\} \ge \frac{\gamma'_k N^2}{(\log N)^k}$$

for all $N > N_0(k)$. □
The value of $\gamma'_k$ we obtain is very small indeed, especially for large $k$.

Let us conclude this introduction with a little history of the problem. Prior to our 
work, the conjecture of Hardy-Littlewood was known only in the case k = 3, a result 
due to Van der Corput (HO] (see also 0) in 1939. For k ^ 4, even the existence of 
infinitely many $k$-term progressions of primes was not previously known. A result of
Heath-Brown from 1981 ^JJ comes close to handling the case k = 4; he shows that 
there are infinitely many 4-tuples $q_1 < q_2 < q_3 < q_4$ in arithmetic progression, where
three of the $q_i$ are prime and the fourth is either prime or a product of two primes. This
has been described as "infinitely many 3½-term arithmetic progressions of primes".

2. THE RELATIVE SZEMEREDI STRATEGY 

A number of people have noted that [11] manages to avoid using any deep facts about
the primes. Indeed the only serious number-theoretical input is a zero-free region for $\zeta$
of "classical type", and this was known to Hadamard and de la Vallee Poussin over 100
years ago. Even this is slightly more than absolutely necessary; one can get by with the
information that $\zeta$ has an isolated pole at 1 [27].

Our main advance, then, lies not in our understanding of the primes but rather in what 
we can say about arithmetic progressions. Let us begin this section by telling a little of 
the story of the study of arithmetic progressions from the combinatorial point of view 
of Erdos and Turan [1] . 



Definition 2.1. Fix an integer $k \ge 3$. We define $r_k(N)$ to be the largest cardinality
of a subset $A \subseteq \{1, \dots, N\}$ which does not contain $k$ distinct elements in arithmetic
progression.
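For very small $N$ the quantity $r_k(N)$ can be found by exhaustive search. The following sketch is purely illustrative (its running time grows exponentially, and the function name is ours): it builds progression-free sets by adding elements in increasing order, so that a forbidden progression can only ever be completed by its largest term.

```python
def largest_ap_free_size(N: int, k: int) -> int:
    """r_k(N): the largest size of a subset of {1..N} with no k-term AP."""
    best = 0
    chosen = set()

    def completes_progression(c: int) -> bool:
        # Would c be the largest term of a k-term AP inside chosen + {c}?
        d = 1
        while c - (k - 1) * d >= 1:
            if all((c - j * d) in chosen for j in range(1, k)):
                return True
            d += 1
        return False

    def extend(next_value: int) -> None:
        nonlocal best
        best = max(best, len(chosen))
        # Prune: even taking every remaining element cannot beat best.
        if len(chosen) + (N - next_value + 1) <= best:
            return
        for c in range(next_value, N + 1):
            if not completes_progression(c):
                chosen.add(c)
                extend(c + 1)
                chosen.remove(c)

    extend(1)
    return best


print([largest_ap_free_size(N, 3) for N in range(1, 15)])
```

For example, $\{1, 2, 4, 5\}$ shows that $r_3(5) \ge 4$, and the search confirms that no 5-element subset of $\{1, \dots, 5\}$ works.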

Erdos and Turan asked simply: what is $r_k(N)$? To this day our knowledge on this
question is very unsatisfactory, and in particular we do not know the answer to

Question 2.2. Is it true that $r_k(N) < \pi(N)$ for $N > N_0(k)$?

If this is so then the primes contain k-term arithmetic progressions on density grounds 
alone, irrespective of any additional structure that they might have. I do not know 
of anyone who seriously doubts the truth of this conjecture, and indeed all known 
lower bounds for $r_k(N)$ are much smaller than $\pi(N)$. The most famous such bound is
Behrend's assertion |T] that 

$$r_3(N) \gg N e^{-c\sqrt{\log N}};$$
slightly superior lower bounds are known for $r_k(N)$, $k \ge 4$ (cf. [21, 22]).

The question of Erdos and Turan became, and remains, rather notorious for its difficulty.
It soon became clear that even seemingly modest bounds should be regarded as great 
achievements in combinatorics. The first really substantial advance was made by Klaus 
Roth, who proved 
Theorem 2.3 (Roth, [23]). We have $r_3(N) \ll N(\log\log N)^{-1}$. □

The key feature of this bound is that $\log\log N$ tends to infinity with $N$, albeit slowly$^2$.
This means that if one fixes some small positive real number, such as 0.0001, and then
takes a set $A \subseteq \{1, \dots, N\}$ containing at least $0.0001N$ integers, then provided $N$ is
sufficiently large this set $A$ will contain three distinct elements in arithmetic progression.

The generalisation of this statement to general k remained unproven until Szemeredi 
clarified the issue in 1969 for k = 4 and then in 1975 for general k. His result is one of 
the most celebrated in combinatorics. 

Theorem 2.4 (Szemeredi jSUEH]). We have $r_k(N) = o(N)$ for any fixed $k \ge 3$. □

Szemeredi's theorem is one of many in this branch of combinatorics for which the bounds, 
if they are ever worked out, are almost unimaginably weak. Although it is in principle 
possible to obtain an explicit function $\omega_k(N)$, tending to zero as $N \to \infty$, for which

$$r_k(N) \le \omega_k(N) N,$$

to my knowledge no-one has done so. Such a function would certainly be worse than
$1/\log_* N$ (the number of times one must apply the log function to $N$ in order to get a
number less than 2), and may even be slowly-growing compared to the inverse of the 
Ackermann function. 

The next major advance in the subject was another proof of Szemeredi's theorem by 
Furstenberg [5]. Furstenberg used methods of ergodic theory, and his argument is relatively
short and conceptual. The methods of Furstenberg have proved very amenable
to generalisation. For example in pj Bergelson and Leibman proved a version of Szemeredi's
theorem in which arithmetic progressions are replaced by more general configurations
$(x + p_1(d), \dots, x + p_k(d))$, where the $p_i$ are polynomials with $p_i(\mathbb{Z}) \subseteq \mathbb{Z}$ and
$p_i(0) = 0$. A variety of multidimensional versions of the theorem are also known. A
significant drawback$^3$ of Furstenberg's approach is that it uses the axiom of choice, and
so does not give any explicit function $\omega_k(N)$.

Rather recently, Gowers [HI E] made a major breakthrough in giving the first "sensible" 
bounds for $r_k(N)$.

Theorem 2.5 (Gowers). Let $k \ge 3$ be an integer. Then there is a constant $c_k > 0$ such
that

$$r_k(N) \ll N(\log\log N)^{-c_k}.$$ □

This is still a long way short of the conjecture that $r_k(N) < \pi(N)$ for $N$ sufficiently large.
However, in addition to coming much closer to this bound than any previous arguments, 
Gowers succeeded in introducing methods of harmonic analysis to the problem for the 
first time since Roth. Since harmonic analysis (in the form of the circle method of 
Hardy and Littlewood) has been the most effective tool in tackling additive problems 
involving the primes, it seems fair to say that it was the work of Gowers which first gave 
us hope of tackling long progressions of primes. The ideas of Gowers will feature fairly 

2 cf. the well-known quotation "log log log N has been proved to tend to infinity with N, but has
never been observed to do so".

3 A discrete analogue of Furstenberg's argument has now been found by Tao (2E1- It does give an 
explicit function $\omega_k(N)$, but once again it tends to zero incredibly slowly.
substantially in this exposition, but in our paper [11] much of what is done is more in
the ergodic-theoretic spirit of Furstenberg and of more recent authors in that area such 
as Host-Kra [H] and Ziegler 03- 

To conclude this discussion of Szemeredi's theorem we mention a variant of it which is 
far more useful in practice. This applies to functions$^4$ $f : \mathbb{Z}/N\mathbb{Z} \to [0, 1]$ rather than just
to (characteristic functions of) sets. It also guarantees many arithmetic progressions of 
length k. This version does, however, follow from the earlier formulation by some fairly 
straightforward averaging arguments due to Varnavides pH] . 

Proposition 2.6 (Szemeredi's theorem, II). Let $k \ge 3$ be an integer, and let $\delta \in (0, 1]$
be a real number. Then there is a constant $c(k, \delta) > 0$ such that for any function
$f : \mathbb{Z}/N\mathbb{Z} \to [0, 1]$ with $\mathbb{E} f = \delta$ we have the bound$^5$

$$\mathbb{E}_{x, d \in \mathbb{Z}/N\mathbb{Z}} f(x) f(x + d) \cdots f(x + (k-1)d) \ge c(k, \delta).$$ □

We do not, in [11], prove any new bounds for $r_k(N)$. Our strategy is to prove a relative
Szemeredi theorem. To describe this we consider, for brevity of exposition, only the 
case k = 4. Consider the following table. 



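 Szemeredi                                 |  Relative Szemeredi
 ------------------------------------------+------------------------------------------
 $\{1, \dots, N\}$                         |  ?
 $A \subseteq \{1, \dots, N\}$,            |  $\mathcal{P}_N$ = primes $\le N$
 $|A| \ge 0.0001N$                         |
 Szemeredi's theorem:                      |  Green-Tao theorem:
 $A$ contains many 4-term APs.             |  $\mathcal{P}_N$ contains many 4-term APs.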



On the left-hand side of this table is Szemeredi's theorem for progressions of length 4, 
stated as the result that a set $A \subseteq \{1, \dots, N\}$ of density 0.0001 contains many 4-term
APs if $N$ is large enough. On the right is the result we wish to prove. Only one thing is
missing: we must find an object to play the role of $\{1, \dots, N\}$. We might try to place
the primes inside some larger set $\mathcal{P}'_N$ in such a way that $|\mathcal{P}_N| \ge 0.0001|\mathcal{P}'_N|$, and hope
to prove an analogue of Szemeredi's theorem for $\mathcal{P}'_N$.

A natural candidate for $\mathcal{P}'_N$ might be the set of almost primes; perhaps, for example,
we could take $\mathcal{P}'_N$ to be the set of integers in $\{1, \dots, N\}$ with at most 100 prime factors.
This would be consistent with the intuition, coming from sieve theory, that almost
primes are much easier to deal with than primes. It is relatively easy to show, for 
example, that there are long arithmetic progressions of almost primes [TK] . 



4 When discussing additive problems it is often convenient to work in the context of a finite abelian
group $G$. For problems involving $\{1, \dots, N\}$ there are various technical tricks which allow one to work
in $\mathbb{Z}/N'\mathbb{Z}$, for some $N' \approx N$. In this expository article we will not bother to distinguish between
$\{1, \dots, N\}$ and $\mathbb{Z}/N\mathbb{Z}$. For examples of the technical trickery required here, see ^[ Definition 9.3], or
the proof of Theorem 2.6 in 

5 We use this very convenient conditional expectation notation repeatedly. $\mathbb{E}_{x \in A} f(x)$ is defined to
equal $|A|^{-1}\sum_{x \in A} f(x)$.
This idea does not quite work, but a variant of it does. Instead of a set $\mathcal{P}'_N$ we
consider what we call a measure$^6$ $\nu : \{1, \dots, N\} \to [0, \infty)$. Define the von Mangoldt
function $\Lambda$ by

$$\Lambda(n) := \begin{cases} \log p & \text{if } n = p^k \text{ is a prime power} \\ 0 & \text{otherwise.} \end{cases}$$

The function $\Lambda$ is a weighted version of the primes; note that the prime number theorem
is equivalent to the fact that $\mathbb{E}_{1 \le n \le N}\Lambda(n) = 1 + o(1)$. Our measure $\nu$ will satisfy the
following two properties.

(i) ($\nu$ majorizes the primes) We have $\Lambda(n) \le 10000\nu(n)$ for all $1 \le n \le N$.

(ii) (primes sit inside $\nu$ with positive density) We have $\mathbb{E}_{1 \le n \le N}\nu(n) = 1 + o(1)$.
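As a numerical aside on the normalisation of the von Mangoldt function, the sketch below tabulates it (taking the value $\log p$ at powers of the prime $p$ and 0 elsewhere) and averages it over $1 \le n \le N$. The choice $N = 10^4$ and the function name are ours; the average being close to 1 is the prime number theorem in action, and no particular rate of convergence is claimed.

```python
import math


def von_mangoldt_table(N: int) -> list:
    """Return lam with lam[n] = Lambda(n) for 0 <= n <= N."""
    lam = [0.0] * (N + 1)
    is_composite = bytearray(N + 1)
    for p in range(2, N + 1):
        if is_composite[p]:
            continue
        # p is prime: cross out its multiples ...
        for multiple in range(2 * p, N + 1, p):
            is_composite[multiple] = 1
        # ... and record log p at p, p^2, p^3, ...
        power = p
        while power <= N:
            lam[power] = math.log(p)
            power *= p
    return lam


N = 10_000
lam = von_mangoldt_table(N)
average = sum(lam[1:]) / N
print(average)
```

Taking $\nu = \Lambda$ makes properties (i) and (ii) hold trivially, which is why they are described as easy to satisfy.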

These two properties are very easy to satisfy, for example by taking $\nu = \Lambda$, or by taking
$\nu$ to be a suitably normalised version of the almost primes. Remember, however, that
we intend to prove a Szemeredi theorem relative to $\nu$. In order to do that it is reasonable
to suppose that $\nu$ will need to meet more stringent conditions. The conditions we use
in [11] are called the linear forms condition and the correlation condition. We will not
state them here in full generality, referring the reader to [11, §3] for full details. We
remark, however, that verifying these conditions is of the same order of difficulty as
obtaining asymptotics for, say,

$$\sum_{n \le N}\nu(n)\nu(n + 2).$$

For this reason there is no chance that we could simply take $\nu = \Lambda$, since if we could
do so we would have solved the twin prime conjecture.

We call a measure v which satisfies the linear forms and correlation conditions pseudo- 
random. 

To succeed with the relative Szemeredi strategy, then, our aim is to find a pseudorandom
measure $\nu$ for which conditions (i) and (ii) are satisfied. Such a function$^7$ comes
to us, like the almost primes, from the idea of using a sieve to bound the primes. The
particular sieve we had recourse to was the $\Lambda^2$-sieve of Selberg. Selberg's great idea was
as follows.

Fix a parameter $R$, and let $\lambda = (\lambda_d)_{d=1}^{R}$ be any sequence of real numbers with $\lambda_1 = 1$.
Then the function

$$a_\lambda(n) := \Big(\sum_{d \mid n,\ d \le R}\lambda_d\Big)^2$$

majorizes the primes greater than $R$. Indeed if $n > R$ is prime then the truncated
divisor sum over $d \mid n$, $d \le R$ contains just one term corresponding to $d = 1$.

6 Actually, $\nu$ is just a function but we use the term "measure" to distinguish it from other functions
appearing in our work. 

7 Actually, this is a lie. There is no pseudorandom measure which majorises the primes themselves.
One must first use a device known as the W-trick to remove biases in the primes coming from their
irregular distribution in residue classes to small moduli. This is discussed in fJ3] 
Although this works for any sequence $\lambda$, some choices are much better than others. If
one wishes to minimise

$$\sum_{n \le N} a_\lambda(n)$$

then, provided that $R$ is a bit smaller than $\sqrt{N}$, one is faced with a minimisation
problem involving a certain quadratic form in the $\lambda_d$s. The optimal weights $\lambda_d^{\mathrm{SEL}}$,
Selberg's weights, have a slightly complicated form, but roughly we have

$$\lambda_d^{\mathrm{SEL}} \approx \lambda_d^{\mathrm{GY}} := \mu(d)\log(R/d)$$

where $\mu(d)$ is the Mobius function. These weights were considered by Goldston and
Yildirim [7] in some of their work on small gaps between primes (and earlier, in other
contexts, by others including Heath-Brown). It seems rather natural, then, to define a
function $\nu$ by

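$$\nu(n) := \begin{cases} \log N & n \le R \\[1mm] \dfrac{1}{\log R}\Big(\sum_{d \mid n,\ d \le R}\lambda_d^{\mathrm{GY}}\Big)^2 & n > R. \end{cases}$$

The weight $1/\log R$ is chosen for normalisation purposes; if $R < N^{1/2-\epsilon}$ for some $\epsilon > 0$
then we have $\mathbb{E}_{1 \le n \le N}\nu(n) = 1 + o(1)$.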
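The majorant property is easy to see in a small computation. The sketch below is an illustration under stated assumptions: it takes the weights to be $\mu(d)\log(R/d)$ and the normalising factor to be $1/\log R$, uses the toy parameters $N = 10^4$ and $R = 10$, and all function names are ours. At a prime $p > R$ only $d = 1$ contributes, so the function takes the value $\log R$, and hence $\log p \le (\log N/\log R)$ times that value.

```python
import math


def mobius(d: int) -> int:
    """The Mobius function, by trial division (adequate for small d)."""
    result, p = 1, 2
    while p * p <= d:
        if d % p == 0:
            d //= p
            if d % p == 0:
                return 0  # a squared prime factor
            result = -result
        p += 1
    if d > 1:
        result = -result
    return result


def truncated_divisor_sum(n: int, R: float) -> float:
    """Sum of mu(d) * log(R / d) over divisors d of n with d <= R."""
    return sum(mobius(d) * math.log(R / d)
               for d in range(1, int(R) + 1) if n % d == 0)


def nu(n: int, N: int, R: float) -> float:
    """log N for n <= R, and (truncated divisor sum)^2 / log R for n > R."""
    if n <= R:
        return math.log(N)
    return truncated_divisor_sum(n, R) ** 2 / math.log(R)


N, R = 10_000, 10.0
for p in (11, 101, 9973):  # primes larger than R
    print(p, nu(p, N, R), math.log(p))
print(sum(nu(n, N, R) for n in range(1, N + 1)) / N)
```

The last line prints the average of the function over $1 \le n \le N$; for parameters this small it should only be read as a rough indication of the normalisation.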

One may more-or-less read out of the work of Goldston and Yildirim a proof of properties
(i) and (ii) above, as well as pseudorandomness, for this function $\nu$. One requires that
$R < N^c$ where $c$ is sufficiently small. These verifications use the classical zero-free region
for the $\zeta$-function and classical techniques of contour integration.

Goldston and Yildirim's work was part of their long-term programme to prove that

$$\liminf_{n \to \infty}\frac{p_{n+1} - p_n}{\log n} = 0, \qquad (2.1)$$

where $p_n$ is the $n$th prime. We have recently learnt that this programme has been
successful. Indeed together with J. Pintz they have used weights coming from a higher-dimensional
sieve in order to establish (2.1). It is certain that without the earlier
preprints of Goldston and Yildirim our work would have developed much more slowly,
at the very least. 

Let us conclude this section by remarking that v will not play a great role in the 
subsequent exposition. It plays a substantial role in [11], but in a relatively non-technical
exposition like this it is often best to merely remark that the measure v and the fact 
that it is pseudorandom is used all the time in proofs of the various statements that we 
will describe. 

3. PROGRESSIONS OF LENGTH THREE AND LINEAR BIAS 

Let $G$ be a finite abelian group with cardinality $N$. If $f_1, \dots, f_k : G \to \mathbb{C}$ are any
functions we write

$$T_k(f_1, \dots, f_k) := \mathbb{E}_{x, d \in G} f_1(x) f_2(x + d) \cdots f_k(x + (k-1)d)$$

for the normalised count of $k$-term APs involving the $f_i$. When all the $f_i$ are equal to
some function $f$, we write

$$T_k(f) := T_k(f, \dots, f).$$

When $f$ is equal to $1_A$, the characteristic function of a set $A \subseteq G$, we write

$$T_k(A) := T_k(1_A) = T_k(1_A, \dots, 1_A).$$

This is simply the number of $k$-term arithmetic progressions in the set $A$, divided by
$N^2$.
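For concreteness, here is a direct transcription of the normalised progression count into code in the case of the cyclic group of residues modulo $N$. It is a sketch only: functions are represented as lists of length $N$, the names are ours, and the double loop over $x$ and $d$ is quadratic in $N$, so it is meant for small examples.

```python
def normalised_ap_count(functions: list, N: int) -> float:
    """T_k(f_1, ..., f_k) on Z/NZ: the average over x and d of
    f_1(x) * f_2(x + d) * ... * f_k(x + (k-1) d), indices taken mod N."""
    total = 0.0
    for x in range(N):
        for d in range(N):
            term = 1.0
            for j, f in enumerate(functions):
                term *= f[(x + j * d) % N]
                if term == 0.0:
                    break
            total += term
    return total / N ** 2


def indicator(subset, N: int) -> list:
    """The characteristic function 1_A as a list of 0.0 / 1.0 values."""
    members = {x % N for x in subset}
    return [1.0 if x in members else 0.0 for x in range(N)]


def T(k: int, subset, N: int) -> float:
    """T_k(A) for a set A of residues mod N."""
    one_A = indicator(subset, N)
    return normalised_ap_count([one_A] * k, N)


print(T(3, range(7), 7), T(3, [], 7), T(3, [0, 1, 2], 7))
```

Note that the count includes the degenerate progressions with $d = 0$, and counts each non-degenerate progression once for $d$ and once for $-d$.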

Let us begin with a discussion of 3-term arithmetic progressions and the trilinear form
$T_3$. If $A \subseteq G$ is a set, then clearly $T_3(A)$ may vary between 0 (when $A = \emptyset$) and 1
(when $A = G$). If, however, one places some restriction on the cardinality of $A$ then
the following question seems natural:

Question 3.1. Let $\alpha \in (0, 1)$, and suppose that $A \subseteq G$ is a set with cardinality $\alpha N$.
What is $T_3(A)$?

To think about this question, we consider some examples. 

Example 1 (Random set). Select a set $A \subseteq G$ by picking each element $x \in G$ to lie in
$A$ independently at random with probability $\alpha$. Then with high probability $|A| \approx \alpha N$.
Also, if $d \ne 0$, the arithmetic progression $(x, x + d, x + 2d)$ lies in $A$ with probability
$\alpha^3$. Thus we expect that $T_3(A) \approx \alpha^3$, and indeed it can be shown using simple large
deviation estimates that this is so with high probability.

Write $E_3(\alpha) := \alpha^3$ for the expected normalised count of three-term progressions in the
random set of Example 1. One might refine Question 3.1 by asking:

Question 3.2. Let $\alpha \in (0, 1)$, and suppose that $A \subseteq G$ is a set with cardinality $\alpha N$.
Is $T_3(A) \approx E_3(\alpha)$?

It turns out that the answer to this question is "no" , as the next example illustrates. 

Example 2 (Highly structured set, I). Let $G = \mathbb{Z}/N\mathbb{Z}$, and consider the set
$A = \{1, \dots, \lfloor\alpha N\rfloor\}$, an interval. It is not hard to check that if $\alpha < 1/2$ then $T_3(A) \approx \tfrac{1}{2}\alpha^2$,
which is much bigger than $E_3(\alpha)$ for small $\alpha$.
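The contrast between a random set and an interval is easy to observe numerically. The sketch below is illustrative only: it takes $N = 1009$ and density about $0.1$, draws a random set of exactly that size with a fixed seed (a slight variant of the independent model in Example 1), and uses our own function name `T3`. For the interval one can check by hand that, for odd $N$ and $\alpha < 1/2$, the progressions counted correspond to pairs of endpoints of equal parity, which is where the value $\frac{1}{2}\alpha^2$ comes from.

```python
import random


def T3(subset, N: int) -> float:
    """Normalised count of pairs (x, d) mod N with x, x+d, x+2d in the set."""
    members = {x % N for x in subset}
    hits = sum(1 for x in members for d in range(N)
               if (x + d) % N in members and (x + 2 * d) % N in members)
    return hits / N ** 2


N = 1009
size = int(0.1 * N)          # floor(alpha * N) with alpha = 0.1
alpha = size / N             # the exact density of both sets below

rng = random.Random(0)       # fixed seed, so the run is reproducible
random_set = rng.sample(range(N), size)
interval = range(1, size + 1)

print("alpha^3      ", alpha ** 3)
print("random set   ", T3(random_set, N))
print("alpha^2 / 2  ", alpha ** 2 / 2)
print("interval     ", T3(interval, N))
```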

These first two examples do not rule out a positive answer to the following question. 

Question 3.3. Let $\alpha \in (0, 1)$, and suppose that $A \subseteq G$ is a set with cardinality $\alpha N$.
Is $T_3(A) \ge E_3(\alpha)$?

If this question did have an affirmative answer, the quest for progressions of length 
three in sets would be a fairly simple one (the primes would trivially contain many 
three-term progressions on density grounds alone, for example). Unfortunately, there 
are counterexamples. 

Example 3 (Highly structured set, II). Let $G = \mathbb{Z}/N\mathbb{Z}$. Then there are sets $A \subseteq G$
with $|A| = \lfloor\alpha N\rfloor$, yet with $T_3(A) \ll \alpha^{10000}$. We omit the details of the construction$^8$,



LONG ARITHMETIC PROGRESSIONS OF PRIMES 



9 



remarking only that such sets can be constructed as unions of intervals of length $\gg_\alpha N$
in $\mathbb{Z}/N\mathbb{Z}$.

Our discussion so far seems to be rather negative, in that our only conclusion is that none 
of Questions 3.1, 3.2 and 3.3 have particularly satisfactory answers. Note, however, that
the three examples we have mentioned are all consistent with the following dichotomy.

Dichotomy 3.4 (Randomness vs Structure for 3-term APs). Suppose that $A \subseteq G$ has
size $\alpha N$. Then either

• $T_3(A) \approx E_3(\alpha)$ or

• $A$ has structure.

It turns out that one may clarify, in quite a precise sense, what is meant by structure 
in this context. The following proposition may be proved by fairly straightforward 
harmonic analysis. We use the Fourier transform on $G$, which is defined as follows. If
$f : G \to \mathbb{C}$ is a function and $\gamma \in \widehat{G}$ a character (i.e. a homomorphism from $G$ to $\mathbb{C}^\times$),
then

$$f^\wedge(\gamma) := \mathbb{E}_{x \in G} f(x)\gamma(x).$$

Proposition 3.5 (Too many/few 3APs implies linear bias). Let $\alpha, \eta \in (0, 1)$. Then
there is $c(\alpha, \eta) > 0$ with the following property. Suppose that $A \subseteq G$ is a set with
$|A| = \alpha N$, and that

$$|T_3(A) - E_3(\alpha)| \ge \eta.$$

Then there is some character $\gamma \in \widehat{G}$ with the property that

$$|(1_A - \alpha)^\wedge(\gamma)| \ge c(\alpha, \eta).$$ □

Note that when $G = \mathbb{Z}/N\mathbb{Z}$ every character $\gamma$ has the form $\gamma(x) = e(rx/N)$. It is the
occurrence of the linear function $x \mapsto rx/N$ here which gives us the name linear bias.

It is an instructive exercise to compare this proposition with Examples 1 and 2 above. 
In Example 2, consider the character $\gamma(x) = e(x/N)$. If $\alpha$ is reasonably small then all
the vectors $e(x/N)$, $x \in A$, have large positive real part and so when the sum

$$(1_A - \alpha)^\wedge(\gamma) = \mathbb{E}_{x \in \mathbb{Z}/N\mathbb{Z}} 1_A(x) e(x/N)$$

is formed there is very little cancellation, with the result that the sum is large.

In Example 1, by contrast, there is (with high probability) considerable cancellation in
the sum for $(1_A - \alpha)^\wedge(\gamma)$ for every character $\gamma$.
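The comparison with Examples 1 and 2 can be carried out numerically. The following sketch assumes the usual convention $e(\theta) = e^{2\pi i\theta}$ and, as before, uses a random set of fixed size drawn with a fixed seed; the parameters and function name are ours. For a non-trivial character the constant $\alpha$ averages to zero, so it suffices to sum the character over the set.

```python
import cmath
import random


def largest_nontrivial_coefficient(subset, N: int) -> float:
    """max over r != 0 of |E_x 1_A(x) e(rx/N)|, which for r != 0 is the
    size of the Fourier coefficient of 1_A - alpha at x -> e(rx/N)."""
    members = [x % N for x in subset]
    best = 0.0
    for r in range(1, N):
        total = sum(cmath.exp(2j * cmath.pi * r * x / N) for x in members)
        best = max(best, abs(total) / N)
    return best


N = 1009
size = int(0.2 * N)

interval = range(1, size + 1)
random_set = random.Random(0).sample(range(N), size)

interval_bias = largest_nontrivial_coefficient(interval, N)
random_bias = largest_nontrivial_coefficient(random_set, N)
print("interval:", interval_bias)
print("random  :", random_bias)
```

The interval has a coefficient comparable to its density, whereas for the random set every non-trivial coefficient is of the much smaller order one expects from square-root cancellation.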

4. LINEAR BIAS AND THE PRIMES 

What use is Dichotomy 3.4 for thinking about the primes? One might hope to use
Proposition 3.5 in order to count 3-term APs in some set $A \subseteq G$ by showing that $A$
does not have linear bias. One would then know that $T_3(A) \approx E_3(\alpha)$, where $|A| = \alpha N$.



8 Basically one considers a set $S \subseteq \mathbb{Z}^2$ formed as the product of a Behrend set in $\{1, \dots, M\}$ and the
interval $\{1, \dots, L\}$, for suitable $M$ and $L$, and then one projects this set linearly to $\mathbb{Z}/N\mathbb{Z}$.
Let us imagine how this might work in the context of the primes. We have the following
proposition$^9$, which is an analogue of Proposition 3.5. In this proposition$^{10}$, $\nu : \mathbb{Z}/N\mathbb{Z} \to [0, \infty)$
is the Goldston-Yildirim measure constructed in §2.

Proposition 4.1. Let $\alpha, \eta \in (0, 2]$. Then there is $c(\alpha, \eta) > 0$ with the following property.
Let $f : \mathbb{Z}/N\mathbb{Z} \to \mathbb{R}$ be a function with $\mathbb{E} f = \alpha$ and such that $0 \le f(x) \le 10000\nu(x)$ for
all $x \in \mathbb{Z}/N\mathbb{Z}$, and suppose that

$$|T_3(f) - E_3(\alpha)| \ge \eta.$$

Then

$$|\mathbb{E}_{x \in \mathbb{Z}/N\mathbb{Z}}(f(x) - \alpha) e(rx/N)| \ge c(\alpha, \eta) \qquad (4.1)$$
for some $r \in \mathbb{Z}/N\mathbb{Z}$. □

This proposition may be applied with $f = \Lambda$ and $\alpha = 1 + o(1)$. If we could rule out
(4.1), then we would know that $T_3(\Lambda) \approx E_3(1) = 1$, and would thus have an asymptotic
for 3-term progressions of primes.

Sadly, (4.1) does hold. Indeed if $N$ is even and $r = N/2$ then, observing that most primes
are odd, it is easy to confirm that

$$\mathbb{E}_{x \in \mathbb{Z}/N\mathbb{Z}}(\Lambda(x) - 1) e(rx/N) = -1 + o(1).$$

That is, the primes do have linear bias. 

Fortunately, it is possible to modify the primes so that they have no linear bias using 
a device that we refer to as the W-trick. We have remarked that most primes are odd, 
and that as a result $\Lambda - 1$ has considerable linear bias. However, if one takes the odd
primes 

3,5,7,11,13,17, 19,... 
and then rescales by the map $x \mapsto (x - 1)/2$, one obtains the set

1,2,3,5,6,8,9,... 

which does not have substantial (mod 2) bias (this is a consequence of the fact that there 
are roughly the same number of primes congruent to 1 and 3 (mod 4)). Furthermore, if 
one can find an arithmetic progression of length k in this set of rescaled primes, one 
can certainly find such a progression in the primes themselves. Unfortunately this set 
of rescaled primes still has linear bias, because it contains only one element $\equiv 1 \pmod 3$.
However, a similar rescaling trick may be applied to remove this bias too, and so on. 
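The effect of the rescaling $x \mapsto (x - 1)/2$ on the odd primes can be seen directly. The sketch below (our own helper names, with the arbitrary cutoff $10^4$) lists the rescaled set and counts its elements in each residue class modulo 2 and modulo 3.

```python
def odd_primes_up_to(limit: int) -> list:
    """The primes 3, 5, 7, ... up to limit, by the sieve of Eratosthenes."""
    table = bytearray([1]) * (limit + 1)
    table[0:2] = b"\x00\x00"
    for p in range(2, int(limit ** 0.5) + 1):
        if table[p]:
            table[p * p :: p] = bytearray(len(range(p * p, limit + 1, p)))
    return [p for p in range(3, limit + 1) if table[p]]


def residue_counts(values, modulus: int) -> list:
    """counts[a] = number of values congruent to a modulo modulus."""
    counts = [0] * modulus
    for v in values:
        counts[v % modulus] += 1
    return counts


rescaled = [(p - 1) // 2 for p in odd_primes_up_to(10_000)]
print(rescaled[:7])
print("mod 2:", residue_counts(rescaled, 2))
print("mod 3:", residue_counts(rescaled, 3))
```

The residue class 1 modulo 3 contains a single element because $(p - 1)/2 \equiv 1 \pmod 3$ forces $3 \mid p$, so only $p = 3$ contributes.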

Here, then, is the W-trick. Take a slowly growing function $w(N) \to \infty$, and set
$W := \prod_{p \le w(N)} p$. Define the rescaled von Mangoldt function $\tilde\Lambda$ by

$$\tilde\Lambda(n) := \frac{\phi(W)}{W}\Lambda(Wn + 1).$$

9 There are two ways of proving this proposition. One uses classical harmonic analysis. For pointers
to such a proof, which would involve establishing an $L^p$-restriction theorem for $\nu$ for some $p \in (2, 3)$, we
refer the reader to [10]. This proof uses more facts about $\nu$ than mere pseudorandomness. Alternatively,
the result may be deduced from Proposition 3.5 by a transference principle using the machinery of [11,
§6-8]. For details of this approach, which is far more amenable to generalisation, see [12]. Note that
Proposition 4.1 does not feature in [11] and is stated here for pedagogical reasons only.

10 Recall that we are being very hazy in distinguishing between $\{1, \dots, N\}$ and $\mathbb{Z}/N\mathbb{Z}$.






The normalisation has been chosen so that $\mathbb{E}\tilde\Lambda = 1 + o(1)$. $\tilde\Lambda$ does not have substantial
bias in any residue class to modulus $q < w(N)$, and so there is at least hope of applying
a suitable analogue of Proposition 4.1 to it.

Now it is a straightforward matter to define a new pseudorandom measure $\tilde\nu$ which
majorises $\tilde\Lambda$. Specifically, we have

(i) ($\tilde\nu$ majorizes the modified primes) We have $\tilde\Lambda(n) \le 10000\tilde\nu(n)$ for all $1 \le n \le N$.

(ii) (modified primes sit inside $\tilde\nu$ with positive density) We have $\mathbb{E}_{1 \le n \le N}\tilde\nu(n) = 1 + o(1)$.
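The W-trick can be tried out with a very small value of $w$. The sketch below is an illustration under our own choices: it fixes $w = 5$, so that $W = 30$ (the product of the primes up to $w$), takes $N = 20000$, and forms the values $\frac{\phi(W)}{W}\Lambda(Wn + 1)$ for $1 \le n \le N$. It then prints their mean and how the support is spread over residue classes modulo 3 (a modulus below $w$) and modulo 7 (a modulus above $w$). All function names are ours.

```python
import math


def prime_table(limit: int) -> bytearray:
    """table[n] == 1 exactly when n is prime, for 0 <= n <= limit."""
    table = bytearray([1]) * (limit + 1)
    table[0:2] = b"\x00\x00"
    for p in range(2, math.isqrt(limit) + 1):
        if table[p]:
            table[p * p :: p] = bytearray(len(range(p * p, limit + 1, p)))
    return table


def von_mangoldt(m: int, is_prime: bytearray) -> float:
    """log p if m is a power of the prime p, and 0 otherwise."""
    if m < 2:
        return 0.0
    if is_prime[m]:
        return math.log(m)
    for j in range(2, m.bit_length() + 1):
        root = round(m ** (1.0 / j))
        for candidate in (root - 1, root, root + 1):
            if candidate >= 2 and candidate ** j == m and is_prime[candidate]:
                return math.log(candidate)
    return 0.0


def w_trick(N: int, w: int):
    """Return W and the list of rescaled values for n = 1, ..., N."""
    small_primes = [p for p in range(2, w + 1) if all(p % q for q in range(2, p))]
    W = math.prod(small_primes)
    phi_W = math.prod(p - 1 for p in small_primes)
    is_prime = prime_table(W * N + 1)
    values = [phi_W / W * von_mangoldt(W * n + 1, is_prime) for n in range(1, N + 1)]
    return W, values


def support_counts(values: list, modulus: int) -> list:
    """How many n in the support fall in each residue class mod modulus."""
    counts = [0] * modulus
    for n, value in enumerate(values, start=1):
        if value > 0:
            counts[n % modulus] += 1
    return counts


W, values = w_trick(20_000, 5)
mean = sum(values) / len(values)
print(W, mean)
print("mod 3:", support_counts(values, 3))
print("mod 7:", support_counts(values, 7))
```

Modulo 7 one residue class is almost empty, since $30n + 1$ is then divisible by 7; this is the bias that a larger $w$ would remove, and it is why $w(N)$ must tend to infinity.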

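The following modified version of Proposition 4.1 may be proved:

Proposition 4.2. Let $\alpha, \eta \in (0, 2]$. Then there is $c(\alpha, \eta) > 0$ with the following property.
Let $f : \mathbb{Z}/N\mathbb{Z} \to \mathbb{R}$ be a function with $\mathbb{E} f = \alpha$ and such that $0 \le f(x) \le 10000\tilde\nu(x)$ for
all $x \in \mathbb{Z}/N\mathbb{Z}$, and suppose that

$$|T_3(f) - E_3(\alpha)| \ge \eta.$$

Then

$$|\mathbb{E}_{x \in \mathbb{Z}/N\mathbb{Z}}(f(x) - \alpha) e(rx/N)| \ge c(\alpha, \eta) \qquad (4.2)$$
for some $r \in \mathbb{Z}/N\mathbb{Z}$. □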

This may be applied with $f = \tilde\Lambda$ and $\alpha = 1 + o(1)$. Now, however, condition (4.2) does
not so obviously hold. In fact, one has the estimate

$$\sup_{r \in \mathbb{Z}/N\mathbb{Z}} |\mathbb{E}_{x \in \mathbb{Z}/N\mathbb{Z}}(\tilde\Lambda(x) - 1) e(rx/N)| = o(1). \qquad (4.3)$$

To prove this requires more than simply the good distribution of $\tilde\Lambda$ in residue classes
to small moduli. It is, however, a fairly standard consequence of the Hardy-Littlewood 
circle method as applied to primes by Vinogradov. In fact, the whole theme of linear 
bias in the context of additive questions involving primes may be traced back to Hardy 
and Littlewood. 

Proposition 4.2 and (4.3) imply that $T_3(\tilde\Lambda) \approx E_3(1) = 1$. Thus there are infinitely
many three-term progressions in the modified (W-tricked) primes, and hence also in the
primes themselves$^{11}$.

5. PROGRESSIONS OF LENGTH FOUR AND QUADRATIC BIAS 

We return now to the discussion of §3. There we were interested in counting 3-term
arithmetic progressions in a set $A \subseteq G$ with cardinality $\alpha N$. In this section our interest
will be in 4-term progressions. 

Suppose then that $A \subseteq G$ is a set, and recall that

$$T_4(A) := \mathbb{E}_{x, d \in G} 1_A(x) 1_A(x + d) 1_A(x + 2d) 1_A(x + 3d)$$

is the normalised count of four-term arithmetic progressions in $A$. One may, of course,
ask the analogue of Question 3.1.

11 In fact, this analysis does not have to be pushed much further to get a proof of Conjecture 1.2 for
$k = 3$, that is to say an asymptotic for 3-term progressions of primes. One simply counts progressions
$x, x + d, x + 2d$ by splitting into residue classes $x \equiv b \pmod W$, $d \equiv b' \pmod W$ and using a simple
variant of Proposition 4.2.
Question 5.1. Let $\alpha \in (0, 1)$, and suppose that $A \subseteq G$ is a set with cardinality $\alpha N$.
What is $T_4(A)$?

Examples 1, 2 and 3 make perfect sense here, and we see once again that there is no
immediately satisfactory answer to Question 5.1. With high probability the random set
of Example 1 has about $E_4(\alpha) := \alpha^4$ four-term APs, but there are structured sets
with substantially more or less than this number of APs. As in §3 these examples are
consistent with a dichotomy of the following type:

Dichotomy 5.2 (Randomness vs Structure for 4-term APs). Suppose that $A \subseteq G$ has
size $\alpha N$. Then either

• $T_4(A) \approx E_4(\alpha)$ or

• $A$ has structure.

Taking into account the three examples we have so far, it is quite possible that
this dichotomy takes exactly the form of that for 3-term APs. That is to say "$A$ has
structure" could just mean that $A$ has linear bias:

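Question 5.3. Let $\alpha, \eta \in (0, 1)$. Suppose that $A \subseteq G$ is a set with $|A| = \alpha N$, and that

$$|T_4(A) - E_4(\alpha)| \ge \eta.$$

Must there exist some $c = c(\alpha, \eta) > 0$ and some character $\gamma \in \widehat{G}$ with the property that

$$|(1_A - \alpha)^\wedge(\gamma)| \ge c(\alpha, \eta)?$$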

That the answer to this question is no, together with the nature of the counterexample, 
is one of the key themes of our whole work. This phenomenon was discovered, in the 
context of ergodic theory, by Furstenberg and Weiss and then again, in the discrete 
setting, by Gowers [Oj. 

Example 4 (Quadratically structured set). Define $A \subseteq \mathbb{Z}/N\mathbb{Z}$ to be the set of all $x$ such
that $x^2 \in [-\alpha N/2, \alpha N/2]$. It is not hard to check using estimates for Gauss sums that
$|A| \sim \alpha N$, and also that

$$\sup_{r \in \mathbb{Z}/N\mathbb{Z}} |\mathbb{E}(1_A(x) - \alpha) e(rx/N)| = o(1),$$

that is to say $A$ does not have linear bias. (In fact, the largest Fourier coefficient of
$1_A - \alpha$ is just $N^{-1/2+\epsilon}$.) Note, however, the relation

$$x^2 - 3(x + d)^2 + 3(x + 2d)^2 - (x + 3d)^2 = 0,$$

valid for arbitrary $x, d \in \mathbb{Z}/N\mathbb{Z}$. This means that if $x, x + d, x + 2d \in A$ then automatically
we have

$$(x + 3d)^2 \in [-7\alpha N/2, 7\alpha N/2].$$

It seems, then, that if we know that $x$, $x + d$ and $x + 2d$ lie in $A$ there is a very high
chance that $x + 3d$ also lies in $A$. This observation may be made rigorous, and it does
indeed transpire that $T_4(A) \ge c\alpha^3$.
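Both the quadratic relation (with the alternating signs $+, -, +, -$) and its consequence for the fourth term can be checked mechanically on a small modulus. The sketch below is illustrative: it takes $N = 101$ and $\alpha = 0.1$, interprets membership of an interval via the representative of $x^2$ modulo $N$ of least absolute value, and uses our own function names. It also prints the normalised count of 4-term progressions next to the fourth power of the density, without any claim about the particular values.

```python
def centred(value: int, N: int) -> int:
    """The representative of value mod N lying in (-N/2, N/2]."""
    r = value % N
    return r - N if r > N / 2 else r


def quadratic_set(N: int, alpha: float) -> set:
    """All x mod N whose square lies in [-alpha N / 2, alpha N / 2] mod N."""
    return {x for x in range(N) if abs(centred(x * x, N)) <= alpha * N / 2}


def relation_holds(N: int) -> bool:
    """Check x^2 - 3(x+d)^2 + 3(x+2d)^2 - (x+3d)^2 = 0 for all x, d mod N."""
    return all(
        (x * x - 3 * (x + d) ** 2 + 3 * (x + 2 * d) ** 2 - (x + 3 * d) ** 2) % N == 0
        for x in range(N) for d in range(N)
    )


def fourth_term_window(N: int, alpha: float) -> int:
    """Largest |(x+3d)^2 mod N| over all x, d with x, x+d, x+2d in the set."""
    A = quadratic_set(N, alpha)
    worst = 0
    for x in A:
        for d in range(N):
            if (x + d) % N in A and (x + 2 * d) % N in A:
                worst = max(worst, abs(centred((x + 3 * d) ** 2, N)))
    return worst


def T4(A: set, N: int) -> float:
    """Normalised count of (x, d) with x, x+d, x+2d, x+3d all in A."""
    hits = sum(1 for x in A for d in range(N)
               if all((x + j * d) % N in A for j in (1, 2, 3)))
    return hits / N ** 2


N, alpha = 101, 0.1
A = quadratic_set(N, alpha)
print(relation_holds(N), fourth_term_window(N, alpha), 7 * alpha * N / 2)
print(T4(A, N), (len(A) / N) ** 4)
```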

How can one rescue the randomness-structure dichotomy in the light of this example? 
Rather remarkably, "quadratic" examples like Example 4 are the only obstructions to 
having $T_4(A) \approx E_4(\alpha)$. There is an analogue of Proposition 3.5 in which characters $\gamma$
are replaced by "quadratic" objects$^{12}$.

Proposition 5.4 (Too many/few 4APs implies quadratic bias). Let $\alpha, \eta \in (0, 1)$. Then
there is $c(\alpha, \eta) > 0$ with the following property. Suppose that $A \subseteq G$ is a set with
$|A| = \alpha N$, and that

$$|T_4(A) - E_4(\alpha)| \ge \eta.$$

Then there is some quadratic object $q \in Q(\kappa)$, where $\kappa \ge \kappa_0(\alpha, \eta)$, with the property
that

$$|\mathbb{E}_{x \in G}(1_A(x) - \alpha) q(x)| \ge c(\alpha, \eta).$$ □

We have not, of course, said what we mean by the set of quadratic objects $Q(\kappa)$. To
give the exact definition, even for $G = \mathbb{Z}/N\mathbb{Z}$, would take us some time, and we refer to
[12] for a full discussion. In the light of Example 4, the reader will not be surprised to
hear that quadratic exponentials such as $q(x) = e(x^2/N)$ are members of $Q$. However,
$Q(\kappa)$ also contains rather more obscure objects$^{13}$ such as

$$q(x) = e(x\sqrt{2}\{x\sqrt{3}\})$$

and

$$q(x) = e(x\sqrt{2}\{x\sqrt{3}\} + x\sqrt{5}\{x\sqrt{7}\} + x\sqrt{11}),$$
where $\{x\}$ denotes fractional part. The parameter $\kappa$ governs the complexity of the
expressions which are allowed: smaller values of $\kappa$ correspond to more complicated
expressions. The need to involve these "generalised" quadratics in addition to "genuine"
quadratics such as $e(x^2/N)$ was first appreciated by Furstenberg and Weiss in the ergodic
theory context, and the matter also arose in the work of Gowers.

6. QUADRATIC BIAS AND THE PRIMES 

It is possible to prove$^{14}$ a version of Proposition 5.4 which might be applied to primes. The analogue of Proposition 4.1 is true but not useful, for the same reason as before: the primes exhibit significant bias in residue classes to small moduli. As before, this bias may be removed using the $W$-trick.

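Proposition 6.1. Let $\alpha, \eta \in (0, 2]$. Then there are $c(\alpha, \eta)$ and $\kappa_0(\alpha, \eta) > 0$ with the following property. Let $f : \mathbb{Z}/N\mathbb{Z} \to \mathbb{R}$ be a function with $\mathbb{E} f = \alpha$ and such that $0 \leq f(x) \leq 10000\nu(x)$ for all $x \in \mathbb{Z}/N\mathbb{Z}$, and suppose that

\[ |T_4(f) - E_4(\alpha)| \geq \eta. \]

Then we have

\[ |\mathbb{E}_{x \in \mathbb{Z}/N\mathbb{Z}} (f(x) - \alpha) q(x)| \geq c(\alpha, \eta) \tag{6.1} \]

for some quadratic object $q \in Q(\kappa)$ with $\kappa \geq \kappa_0(\alpha, \eta)$. $\Box$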

12 The proof of this proposition is long and difficult and may be found in [12]. It is heavily based on the arguments of Gowers [HI E]. This proposition has no place in [11], and it is once again included for pedagogical reasons only. It played an important role in the development of our ideas.

13 We are thinking of these as defined on $\{1, \dots, N\}$ rather than $\mathbb{Z}/N\mathbb{Z}$.

14 As with Proposition 4.1, this proposition does not appear in [11], though it motivated our work and a variant of it is used in our later work JJ]. Once again there are two proofs. One is based on a combination of harmonic analysis and the work of Gowers, is difficult, and requires more facts about $\nu$ than mere pseudorandomness. This was our original argument. It is also possible to proceed by a transference principle, deducing the result from Proposition 5.4 using the machinery of [11, §6-8]. See [Tl) for more details.
One is interested, of course, in applying this with $f = \Lambda$. If we could verify that (6.1) does not hold, that is to say the primes do not have quadratic bias, then it would follow that $T_4(\Lambda) \approx E_4(1) = 1$. This means that the modified ($W$-tricked) primes have many 4-term progressions, and hence so do the primes themselves$^{15}$.

One wishes to show, then, that for fixed $\kappa$ one has

\[ \sup_{q \in Q(\kappa)} |\mathbb{E}_{x \in \mathbb{Z}/N\mathbb{Z}} (\Lambda(x) - 1) q(x)| = o(1). \tag{6.2} \]

Such a result is certainly not a consequence of the classical Hardy-Littlewood circle method$^{16}$. Generalised quadratic phases such as $q(x) = e(x\sqrt{2}\{x\sqrt{3}\})$ are particularly troublesome. Although we do now have a proof of (6.2), it is very long and complicated. See [13] for details.

In the next section we explain how our original paper [11] managed to avoid the need to prove (6.2).



7. QUOTIENTING OUT THE BIAS - THE ENERGY INCREMENT ARGUMENT 

Our paper [11] failed to rule out the possibility that $\Lambda - 1$ correlates with some quadratic function $q \in Q(\kappa)$. For that reason we did not obtain a proof of Conjecture 1.2, getting instead the weaker statement of Theorem 1.3. In this section$^{17}$ we outline the energy increment argument of [11], which allowed us to deal with the possibility that $\Lambda - 1$ does correlate with a quadratic.

We begin by writing

\[ \Lambda := 1 + f_0. \tag{7.1} \]

Proposition 6.1 tells us that $T_4(\Lambda) \approx 1$, unless $f_0$ correlates with some quadratic $q_0 \in Q$. Suppose, then, that

\[ |\mathbb{E}_{x \in \mathbb{Z}/N\mathbb{Z}} f_0(x) q_0(x)| \geq \eta. \]

Then we revise the decomposition (7.1) to

\[ \Lambda := F_1 + f_1, \tag{7.2} \]

where $F_1$ is a function defined using $q_0$. In fact, $F_1$ is basically the average of $\Lambda$ over approximate level sets of $q_0$. That is, one picks an appropriate scale$^{18}$ $\epsilon = 1/J$, and

15 In fact, just as for progressions of length 3, this allows one to obtain a proof of Conjecture 1.2 for $k = 4$, that is to say an asymptotic for prime progressions of length 4. See [14].

16 Though reasonably straightforward extensions of the circle method do permit one to handle genuine quadratic phases such as $q(x) = e(x^2\sqrt{2})$.

17 The exposition in this section is rather looser than in other sections. To make the argument rigorous, one must introduce various technical devices, such as the exceptional sets which feature in [11, §7,8]. We are also being rather vague about the meaning of terms such as "correlate", and the parameter $\kappa$ involved in the definition of quadratic object. Note also that the argument of [11] uses soft quadratic objects rather than the genuine ones which we are discussing here for expositional purposes. See §8 for a brief discussion of these.

18 As we remarked, the actual situation is more complicated. There is an averaging over possible decompositions of $[0, 1]$ into intervals of length $\epsilon$, to ensure that the level sets look pleasant. There is also a need to consider exceptional sets, which unfortunately makes the argument look rather messy.
then defines

\[ F_1 := \mathbb{E}(\Lambda | \mathcal{B}_0), \]

where $\mathcal{B}_0$ is the $\sigma$-algebra generated by the sets $\{x : q_0(x) \in [j/J, (j+1)/J)\}$.

A variant of Proposition 6.1 implies a new dichotomy: either $T_4(\Lambda) \approx T_4(F_1)$, or else $f_1$ correlates with some quadratic $q_1 \in Q$. Suppose then that

\[ |\mathbb{E}_{x \in \mathbb{Z}/N\mathbb{Z}} f_1(x) q_1(x)| \geq \eta. \]

We then further revise the decomposition (7.2) to

\[ \Lambda := F_2 + f_2, \]

where now

\[ F_2 := \mathbb{E}(\Lambda | \mathcal{B}_0 \wedge \mathcal{B}_1), \]

the $\sigma$-algebra being defined by the joint level sets of $q_0$ and $q_1$.

We repeat this process. It turns out that the algorithm stops in a finite number $s$ of steps, bounded in terms of $\eta$. The reason for this is that each new assumption

\[ |\mathbb{E}_{x \in \mathbb{Z}/N\mathbb{Z}} f_j(x) q_j(x)| \geq \eta \]

implies an increased lower bound for the energy of $\Lambda$ relative to the $\sigma$-algebra $\mathcal{B}_0 \wedge \dots \wedge \mathcal{B}_{j-1}$, that is to say the quantity

\[ E_j := \|\mathbb{E}(\Lambda | \mathcal{B}_0 \wedge \dots \wedge \mathcal{B}_{j-1})\|_2. \]

The fact that $\Lambda$ is dominated by $\nu$ does, however, provide a universal bound for the energy, by dint of the evident inequality

\[ E_j \leq 10000 \|\mathbb{E}(\nu | \mathcal{B}_0 \wedge \dots \wedge \mathcal{B}_{j-1})\|_2. \]

The pseudorandomness of $\nu$ allows one$^{19}$ to bound the right-hand side here by $O(1)$.
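The shape of this iteration can be seen in a toy computation. The Python sketch below is not the argument of [11]: it replaces $\Lambda$ by an artificial bounded function `f`, replaces the class $Q$ by a handful of genuine quadratic phases $e(rx^2/N)$ with $1 \leq r \leq 5$, and omits the exceptional sets and the averaging over decompositions mentioned in the footnotes. All names (`cond_exp`, `energy_increment`, `max_r`, and so on) and the parameters $N = 101$, $J = 8$, $\eta = 0.1$ are hypothetical choices made for illustration. What it does show is the mechanism: each refinement of the partition by level sets can only increase the energy, and the conditional expectation keeps the mean of `f`.

```python
import cmath
import math


def e(t):
    """The additive character e(t) = exp(2*pi*i*t)."""
    return cmath.exp(2j * math.pi * t)


def cond_exp(f, labels):
    """E(f | B): replace f by its average on each cell of the partition `labels`."""
    total, count = {}, {}
    for value, lab in zip(f, labels):
        total[lab] = total.get(lab, 0.0) + value
        count[lab] = count.get(lab, 0) + 1
    return [total[lab] / count[lab] for lab in labels]


def l2_norm(g):
    """Normalised L^2 norm (E g^2)^(1/2)."""
    return math.sqrt(sum(v * v for v in g) / len(g))


def energy_increment(f, n, eta, J=8, max_r=5):
    """Refine a partition until f - E(f|B) has correlation below eta with every
    unused candidate phase e(r x^2 / n), 1 <= r <= max_r (a toy stand-in for Q)."""
    labels = [()] * n                      # trivial sigma-algebra: one cell
    unused = list(range(1, max_r + 1))
    F = cond_exp(f, labels)
    energies = [l2_norm(F)]
    while unused:
        resid = [fv - Fv for fv, Fv in zip(f, F)]
        corr = {
            r: abs(sum(resid[x] * e(r * x * x / n) for x in range(n))) / n
            for r in unused
        }
        r = max(corr, key=corr.get)
        if corr[r] < eta:
            break                          # no remaining bias among the candidates
        unused.remove(r)
        # Join with the level sets {x : frac(r x^2 / n) in [j/J, (j+1)/J)}.
        labels = [lab + ((r * x * x % n) * J // n,) for x, lab in enumerate(labels)]
        F = cond_exp(f, labels)
        energies.append(l2_norm(F))
    return F, energies


N = 101
f = [1 + math.cos(2 * math.pi * x * x / N) for x in range(N)]  # toy biased function
F_s, energies = energy_increment(f, N, eta=0.1)
print(len(energies) - 1, energies)
```

In this toy setting termination is forced by the finite list of candidates; in the argument sketched above it is instead forced by the uniform upper bound on the energy.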

At termination, then, we have a decomposition

\[ \Lambda = F_s + f_s, \]

where

\[ \sup_{q \in Q} |\mathbb{E}_{x \in \mathbb{Z}/N\mathbb{Z}} f_s(x) q(x)| < \eta, \tag{7.3} \]

and $F_s$ is defined by

\[ F_s := \mathbb{E}(\Lambda | \mathcal{B}_0 \wedge \mathcal{B}_1 \wedge \dots \wedge \mathcal{B}_{s-1}). \tag{7.4} \]

A variant of Proposition 6.1 implies, together with (7.3), that

\[ T_4(\Lambda) \approx T_4(F_s). \tag{7.5} \]

What can be said about $T_4(F_s)$? Let us note two things about the function $F_s$. First of all the definition (7.4) implies that

\[ \mathbb{E} F_s = \mathbb{E} \Lambda = 1 + o(1). \tag{7.6} \]

19 This deduction uses the machinery of the Gowers $U^3$-norm, which we do not discuss in this survey. See [11, §6] for a full discussion. Of specific relevance is the fact that $\|\nu - 1\|_{U^3} = o(1)$, which is a consequence of the pseudorandomness of $\nu$.
Secondly, $F_s$ is not too large pointwise; this is again an artifact of $\Lambda$ being dominated by $\nu$. We have, of course,

\[ \|F_s\|_\infty = \|\mathbb{E}(\Lambda | \mathcal{B}_0 \wedge \mathcal{B}_1 \wedge \dots \wedge \mathcal{B}_{s-1})\|_\infty \leq 10000 \|\mathbb{E}(\nu | \mathcal{B}_0 \wedge \mathcal{B}_1 \wedge \dots \wedge \mathcal{B}_{s-1})\|_\infty. \]

The pseudorandomness of $\nu$ can again be used$^{20}$ to show that the right-hand side here is $10000 + o(1)$; that is,

\[ \|F_s\|_\infty \leq 10000 + o(1). \tag{7.7} \]

The two properties (7.6) and (7.7) together mean that $F_s$ behaves rather like the characteristic function of a subset of $\mathbb{Z}/N\mathbb{Z}$ with density at least 1/10000. This suggests the use of Szemerédi's theorem to bound $T_4(F_s)$ below. The formulation of that theorem given in Proposition 2.6 applies to exactly this situation, and it tells us that

\[ T_4(F_s) \geq c \]

for some absolute constant $c > 0$. Together with (7.5) this implies a similar lower bound for $T_4(\Lambda)$, which means that there are infinitely many 4-term arithmetic progressions of primes.

Let us conclude this section with an overview of what it is we have proved. The only facts about $\Lambda$ that we used were that it is dominated pointwise by $10000\nu$, and that $\mathbb{E}\Lambda$ is not too small. The argument sketched above applies equally well in the general context of functions with these properties, and in the context of an arbitrary pseudorandom measure (not just the Goldston-Yıldırım measure).

Proposition 7.1 (Relative Szemerédi Theorem). Let $\delta \in (0, 1]$ be a real number and let $\nu$ be a pseudorandom measure. Then there is a constant $c'(4, \delta) > 0$ with the following property. Suppose that $f : \mathbb{Z}/N\mathbb{Z} \to \mathbb{R}$ is a function such that $0 \leq f(x) \leq \nu(x)$ pointwise, and for which $\mathbb{E} f \geq \delta$. Then we have the estimate

\[ T_4(f) \geq c'(4, \delta). \qquad \Box \]

In [11] we prove the same$^{21}$ theorem for progressions of any length $k \geq 3$.

Proposition 7.1 captures the spirit of our argument quite well. We first deal with arithmetic progressions in a rather general context. Only upon completion of that study do we concern ourselves with the primes, and this is simply a matter of constructing an appropriate pseudorandom measure. Note also that Szemerédi's theorem is used as a "black box". We do not need to understand the proof of it, or to have good bounds for it.

Observe that one consequence of Proposition 7.1 is a Szemerédi theorem relative to the primes: any subset of the primes with positive relative density contains progressions of arbitrary length. Applying this to the set of primes congruent to $1 \pmod 4$, we see that there are arbitrarily long progressions of numbers which are sums of two squares.
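A small search makes the last remark concrete. The Python sketch below is a brute-force illustration, not a method from the text; the function names and the search limit of 100 are assumptions of ours. It lists the primes congruent to 1 modulo 4 below the limit, finds the longest arithmetic progression among them, and writes each term as a sum of two squares.

```python
import math


def primes_1_mod_4(limit):
    """Primes p <= limit with p % 4 == 1, via a simple sieve (limit >= 1)."""
    sieve = [True] * (limit + 1)
    sieve[0:2] = [False, False]
    for p in range(2, math.isqrt(limit) + 1):
        if sieve[p]:
            for m in range(p * p, limit + 1, p):
                sieve[m] = False
    return [p for p in range(limit + 1) if sieve[p] and p % 4 == 1]


def two_squares(n):
    """Return (a, b) with a*a + b*b == n and a <= b, or None if there is none."""
    a = 0
    while 2 * a * a <= n:
        b = math.isqrt(n - a * a)
        if a * a + b * b == n:
            return a, b
        a += 1
    return None


def longest_progression(values):
    """Longest arithmetic progression with positive difference inside `values`."""
    pool = set(values)
    best = []
    for start in values:
        for step in {v - start for v in values if v > start}:
            run = [start]
            while run[-1] + step in pool:
                run.append(run[-1] + step)
            if len(run) > len(best):
                best = run
    return best


ps = primes_1_mod_4(100)
ap = longest_progression(ps)
print(ap, [two_squares(p) for p in ap])
```

Below 100 the search finds the five-term progression 5, 17, 29, 41, 53 with common difference 12; the theorem asserts that arbitrarily long progressions of this kind exist, which no finite search can confirm.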



20 Again, the machinery of the Gowers $U^3$-norm is used.

21 Note, however, that the definition of pseudorandom measure is strongly dependent on $k$.
8. SOFT OBSTRUCTIONS

Readers familiar with [11] may have been confused by our exposition thus far, since "quadratic objects" play essentially no role in that paper. The purpose of this brief section is to explain why this is so, and to provide a bridge between this survey and our paper. Further details and discussion may be found in [11, §6].

Let us start by recalling §3, where a set of "obstructions" to a set $A \subseteq G$ having roughly $E_3(\alpha)$ three-term APs was obtained. This was just the collection of characters $\gamma \in \widehat{G}$, and we used the term linear bias to describe correlation with one of these characters.

Let $f : G \to \mathbb{C}$ be a function with $\|f\|_\infty \leq 1$. Now we observe the formula

\[ \mathbb{E}_{a,b \in G} f(x+a) f(x+b) \overline{f(x+a+b)} = \sum_{\gamma \in \widehat{G}} |\widehat{f}(\gamma)|^2 \widehat{f}(\gamma) \gamma(x), \]

which may be verified by straightforward harmonic analysis on $G$. Coupled with the fact that

\[ \sum_{\gamma \in \widehat{G}} |\widehat{f}(\gamma)|^2 \leq 1, \]

a consequence of Parseval's identity, this means that the$^{22}$ "dual function"

\[ \mathcal{D}_2 f(x) := \mathbb{E}_{a,b \in G} f(x+a) f(x+b) \overline{f(x+a+b)} \]

can be approximated by the weighted sum of a few characters. Every character is actually equal to a dual function; indeed we clearly have $\mathcal{D}_2(\gamma) = \gamma$.
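The formula for the dual function can be checked numerically on a small cyclic group. The Python sketch below is an illustration under stated assumptions: it takes $G = \mathbb{Z}/12\mathbb{Z}$, uses the normalisation $\widehat{f}(r) = \mathbb{E}_x f(x) e(-rx/N)$, and places the complex conjugate on the factor $f(x+a+b)$, which is the placement consistent with $\mathcal{D}_2(\gamma) = \gamma$. The function names and the test function `f` are hypothetical.

```python
import cmath
import math


def e(t):
    """The additive character e(t) = exp(2*pi*i*t)."""
    return cmath.exp(2j * math.pi * t)


def fourier(f):
    """Fourier coefficients: f^(r) = E_x f(x) e(-r x / n) for r in Z/nZ."""
    n = len(f)
    return [sum(f[x] * e(-r * x / n) for x in range(n)) / n for r in range(n)]


def dual2(f):
    """D_2 f(x) = E_{a,b} f(x+a) f(x+b) conj(f(x+a+b)), computed directly."""
    n = len(f)
    return [
        sum(
            f[(x + a) % n] * f[(x + b) % n] * f[(x + a + b) % n].conjugate()
            for a in range(n)
            for b in range(n)
        )
        / n**2
        for x in range(n)
    ]


def dual2_via_fourier(f):
    """The right-hand side: sum over r of |f^(r)|^2 f^(r) e(r x / n)."""
    n = len(f)
    fh = fourier(f)
    return [
        sum(abs(c) ** 2 * c * e(r * x / n) for r, c in enumerate(fh))
        for x in range(n)
    ]


N = 12
f = [0.5 * e(x * x / 7.0) + 0.3 for x in range(N)]   # arbitrary function, |f| <= 0.8
gamma = [e(3 * x / N) for x in range(N)]             # a character of Z/12Z

err_formula = max(abs(u - v) for u, v in zip(dual2(f), dual2_via_fourier(f)))
err_character = max(abs(u - v) for u, v in zip(dual2(gamma), gamma))
parseval_sum = sum(abs(c) ** 2 for c in fourier(f))
print(err_formula, err_character, parseval_sum)
```

Both error terms should vanish up to floating-point rounding, and the Parseval sum should not exceed 1 because $\|f\|_\infty \leq 1$.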

We think of the dual functions $\mathcal{D}_2(f)$ as soft linear obstructions. They may be used in the iterative argument of §7 in place of the genuinely linear functions, after one has established certain algebraic closure properties of these functions (see [11, Proposition 6.2]).

The great advantage of these soft obstructions is that it is reasonably obvious how they should be generalised to give objects appropriate for the study of longer arithmetic progressions. We define

\[ \mathcal{D}_3(f)(x) := \mathbb{E}_{a,b,c \in G} f(x+a) f(x+b) f(x+c) \overline{f(x+a+b)}\, \overline{f(x+a+c)}\, \overline{f(x+b+c)} f(x+a+b+c). \]

This is a kind of sum of $f$ over parallelepipeds (minus one vertex), whereas $\mathcal{D}_2(f)$ was a sum over parallelograms (minus one vertex). This we think of as a soft quadratic obstruction. Gone are the complications of having to deal with explicit generalised quadratic functions which, rest assured, only become worse when one deals with progressions of length 5 and longer.

The idea of using these soft obstructions came from the ergodic-theory work of Host 
and Kra [IHj, where very similar objects are involved. 

We conclude by emphasising that soft obstructions lead to relatively soft results, such as Theorem 1.3. To get a proof of Conjecture 1.2 it will be necessary to return to generalised quadratic functions and their higher-order analogues.

22 The subscript 2 refers to the Gowers $U^2$-norm, which is relevant to the study of progressions of length 3.